Revert "fix(trtllm_mha): clamp page_table to k_cache page range" by pyc96 · Pull Request #7 · pyc96/sglang

pyc96 · 2026-05-23T07:04:18Z

Summary

Revert the defensive clamp from #5. It is no longer
needed now that #6 (Patch E) eliminates the source of OOB
page_table values entirely.

Stacked on #6.

Why revert

The clamp was a safety net that traded correctness for liveness:
when it actually triggered, it replaced an OOB page index with the
LAST valid SWA page, producing slightly wrong attention values for
that position and lowering MTP draft acceptance (1.67 instead of
2.84 on 26B summarization).

Patch E (#6) fixes the actual root cause — the
FROZEN_KV_MTP draft backend not swapping its SWA-aware attributes
when token_to_kv_pool is swapped to the target's SWA pool — so
OOB values never occur and the clamp never fires. It's now dead
code that adds one .clamp() per kernel call for no benefit.

Verification (Gemma-4-E4B-IT, trtllm_mha, MTP, summarization 8 k/1 k × 80, 1× B200)

metric	Patch E (with clamp)	After revert (Patch E only)
outcome	OK	OK
trap events (PR #3)	0	0
output tok/s	4022	4049 (within noise)
accept length	2.13	2.13 (identical)
TPOT median	9.99 ms	9.94 ms
benchmark duration	19.89 s	19.76 s

Same results. The clamp truly was dead code.

Safety net

If a future code change ever reintroduces an OOB page_table value,
the opt-in bounds-check trap from #3
(SGLANG_TRTLLM_MHA_DEBUG=1) will still catch it with a
deterministic Python exception + dump for triage.

CI States

Latest PR Test (Base): ❌ Missing run-ci label -- add it to run CI tests.
Latest PR Test (Extra): ❌ Blocked -- run-ci is required first.

This reverts commit 5547e41. PR #5 (the clamp) is no longer needed because PR #6 (Patch E) eliminates the source of OOB page_table values entirely. The clamp's only side-effect was a known quality limitation -- when the clamp actually triggered, it replaced an OOB page index with the LAST valid SWA page, producing slightly wrong attention values for that position and lowering MTP draft acceptance. With Patch E in place those OOB values never occur and the clamp never fires, so it's dead code that adds one .clamp() per kernel call for no benefit. Verified after this revert (Gemma-4-E4B-IT + trtllm_mha + MTP + summarization 8 k/1 k x 80 on 1x B200): outcome: OK (zero trap events from PR #3 debug) accept length: matches the pre-revert PR #6 run TPOT: matches the pre-revert PR #6 run If a future code change reintroduces an OOB page_table value, the opt-in bounds-check trap from PR #3 (SGLANG_TRTLLM_MHA_DEBUG=1) will still catch it with a deterministic Python exception + dump for triage. Co-authored-by: Claude

github-actions Bot added the blackwell label May 23, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Revert "fix(trtllm_mha): clamp page_table to k_cache page range"#7

Revert "fix(trtllm_mha): clamp page_table to k_cache page range"#7
pyc96 wants to merge 1 commit into
pyc/sota-gemma4-mtp-patch-e-rootcausefrom
pyc/sota-gemma4-mtp-revert-clamp

pyc96 commented May 23, 2026 •

edited by github-actions Bot

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

pyc96 commented May 23, 2026 • edited by github-actions Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why revert

Verification (Gemma-4-E4B-IT, trtllm_mha, MTP, summarization 8 k/1 k × 80, 1× B200)

Safety net

CI States

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

pyc96 commented May 23, 2026 •

edited by github-actions Bot

Loading